Overview

Dataset statistics

Number of variables: 17
Number of observations: 21613
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 5
Duplicate rows (%): < 0.1%
Total size in memory: 2.8 MiB
Average record size in memory: 136.0 B

Variable types

NUM: 16
BOOL: 1

Reproduction

Analysis started: 2022-03-16 07:39:43.981313
Analysis finished: 2022-03-16 07:40:42.515140
Duration: 58.53 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 5 (< 0.1%) duplicate rows [Duplicates]
view has 19489 (90.2%) zeros [Zeros]
sqft_basement has 13126 (60.7%) zeros [Zeros]
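
The missing-cell figures and the zero warnings are plain per-column counts. A minimal standard-library sketch of how they can be recomputed, assuming rows are tuples of column values and that a missing cell is represented as None (pandas would also treat NaN as missing); `overview_counts` is a hypothetical helper, not part of pandas-profiling:

```python
def overview_counts(rows, columns):
    """Per-column missing and zero counts for rows given as tuples.

    Assumption: a missing cell is None (pandas also counts NaN).
    """
    n = len(rows)
    per_column = {}
    for i, col in enumerate(columns):
        cells = [row[i] for row in rows]
        zeros = sum(1 for v in cells if v == 0)
        per_column[col] = {
            "missing": sum(1 for v in cells if v is None),
            "zeros": zeros,
            "zeros_pct": 100.0 * zeros / n,
        }
    missing_cells = sum(c["missing"] for c in per_column.values())
    return {
        "observations": n,
        "variables": len(columns),
        "missing_cells": missing_cells,
        # Share of all cells (rows x columns) that are missing
        "missing_cells_pct": 100.0 * missing_cells / (n * len(columns)),
        "columns": per_column,
    }
```

As a check against the warnings above: 19489 zeros out of 21613 observations is about 90.17% for view, which the report rounds to 90.2%.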

Variables

price
Real number (ℝ≥0)

Distinct count: 4028
Unique (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 540088.1417665294
Minimum: 75000.0
Maximum: 7700000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 75000
5th percentile: 210000
Q1: 321950
Median: 450000
Q3: 645000
95th percentile: 1156480
Maximum: 7700000
Range: 7625000
Interquartile range (IQR): 323050
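
The quantile figures follow directly from the sorted values. A minimal standard-library sketch, assuming the report interpolates linearly between the two nearest order statistics (the default of pandas' `Series.quantile`); `quantile` and `quantile_stats` are hypothetical names:

```python
import math


def quantile(sorted_values, q):
    """q-th quantile (0 <= q <= 1) of an ascending list, interpolating
    linearly between the two nearest ranks (pandas' default method)."""
    pos = (len(sorted_values) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def quantile_stats(values):
    """Recompute the 'Quantile statistics' block for a non-empty list."""
    s = sorted(values)
    stats = {
        "minimum": s[0],
        "5th percentile": quantile(s, 0.05),
        "Q1": quantile(s, 0.25),
        "median": quantile(s, 0.50),
        "Q3": quantile(s, 0.75),
        "95th percentile": quantile(s, 0.95),
        "maximum": s[-1],
    }
    stats["range"] = stats["maximum"] - stats["minimum"]
    stats["IQR"] = stats["Q3"] - stats["Q1"]
    return stats
```

The last two entries are simple differences; for price, IQR = 645000 - 321950 = 323050 and range = 7700000 - 75000 = 7625000, as reported.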

Descriptive statistics

Standard deviation: 367127.1965
Coefficient of variation (CV): 0.6797542255
Kurtosis: 34.58554043
Mean: 540088.1418
Median Absolute Deviation (MAD): 150000
Skewness: 4.024069145
Sum: 1.167292501e+10
Variance: 1.347823784e+11
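
Each descriptive statistic is a short formula over the raw values. A minimal standard-library sketch, assuming sample (n - 1) variance, an unscaled median absolute deviation, and the bias-corrected skewness and excess-kurtosis estimators that pandas' `Series.skew()` and `Series.kurt()` use; whether pandas-profiling v2.7.1 uses exactly these estimators is an assumption, and `describe` is a hypothetical name:

```python
import math
import statistics


def describe(values):
    """Recompute the 'Descriptive statistics' block for a list of numbers.

    Assumes sample (n - 1) variance and pandas-style bias-corrected
    skewness and excess kurtosis.
    """
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = sum(values) / n
    dev = [x - mean for x in values]
    # Sums of squared, cubed and fourth-power deviations from the mean
    s2 = sum(d ** 2 for d in dev)
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    if s2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    variance = s2 / (n - 1)
    std = math.sqrt(variance)
    median = statistics.median(values)
    # Median of absolute deviations from the median, no scale factor
    mad = statistics.median([abs(x - median) for x in values])
    # Adjusted Fisher-Pearson skewness G1
    g1 = (s3 / n) / (s2 / n) ** 1.5
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    # Excess kurtosis G2 (0 for a normal distribution)
    a = n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return {
        "standard deviation": std,
        "coefficient of variation": std / mean,  # undefined if mean == 0
        "kurtosis": a - b,
        "mean": mean,
        "MAD": mad,
        "skewness": skewness,
        "sum": sum(values),
        "variance": variance,
    }
```

The coefficient of variation is the standard deviation over the mean; for price, 367127.1965 / 540088.1418 gives about 0.6798, matching the reported CV.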

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                Count  Frequency (%)
350000               172    0.8%
450000               172    0.8%
550000               159    0.7%
500000               152    0.7%
425000               150    0.7%
325000               148    0.7%
400000               145    0.7%
375000               138    0.6%
300000               133    0.6%
525000               131    0.6%
Other values (4018)  20113  93.1%

Minimum 5 values
Value  Count  Frequency (%)
75000  1      < 0.1%
78000  1      < 0.1%
80000  1      < 0.1%
81000  1      < 0.1%
82000  1      < 0.1%

Maximum 5 values
Value    Count  Frequency (%)
7700000  1      < 0.1%
7062500  1      < 0.1%
6885000  1      < 0.1%
5570000  1      < 0.1%
5350000  1      < 0.1%
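
The three value tables are all derived from per-value counts. A minimal standard-library sketch, assuming the "Other values" row aggregates everything outside the ten most frequent values and that ties are broken by first appearance (pandas may order ties differently); `value_tables` is a hypothetical name:

```python
from collections import Counter


def value_tables(values, top=10, extremes=5):
    """Build common-value, minimum and maximum tables from raw values.

    Each row is (value, count, frequency in percent).
    """
    n = len(values)
    counts = Counter(values)

    def row(v):
        return (v, counts[v], 100.0 * counts[v] / n)

    # Most frequent values first; ties keep first-seen order.
    common = [row(v) for v, _ in counts.most_common(top)]
    shown = sum(count for _, count, _ in common)
    # (remaining distinct values, their total count, their percentage)
    other = (len(counts) - len(common), n - shown, 100.0 * (n - shown) / n)
    distinct = sorted(counts)
    minimum = [row(v) for v in distinct[:extremes]]
    maximum = [row(v) for v in reversed(distinct[-extremes:])]
    return common, other, minimum, maximum
```

For price, the ten most common values cover 1500 rows, so "Other values (4018)" accounts for 21613 - 1500 = 20113 rows spread over 4028 - 10 = 4018 distinct values.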
 

bedrooms
Real number (ℝ≥0)

Distinct count: 13
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.37084162309721
Minimum: 0
Maximum: 33
Zeros: 13
Zeros (%): 0.1%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 3
Median: 3
Q3: 4
95th percentile: 5
Maximum: 33
Range: 33
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9300618311
Coefficient of variation (CV): 0.2759138325
Kurtosis: 49.06365318
Mean: 3.370841623
Median Absolute Deviation (MAD): 1
Skewness: 1.974299535
Sum: 72854
Variance: 0.8650150098

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value             Count  Frequency (%)
3                 9824   45.5%
4                 6882   31.8%
2                 2760   12.8%
5                 1601   7.4%
6                 272    1.3%
1                 199    0.9%
7                 38     0.2%
0                 13     0.1%
8                 13     0.1%
9                 6      < 0.1%
Other values (3)  5      < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
0      13     0.1%
1      199    0.9%
2      2760   12.8%
3      9824   45.5%
4      6882   31.8%

Maximum 5 values
Value  Count  Frequency (%)
33     1      < 0.1%
11     1      < 0.1%
10     3      < 0.1%
9      6      < 0.1%
8      13     0.1%

bathrooms
Real number (ℝ≥0)

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7497339564151206
Minimum: 0
Maximum: 8
Zeros: 86
Zeros (%): 0.4%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 2
Q3: 2
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7348730839
Coefficient of variation (CV): 0.4199913257
Kurtosis: 1.989574489
Mean: 1.749733956
Median Absolute Deviation (MAD): 1
Skewness: 0.9021053898
Sum: 37817
Variance: 0.5400384495

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value  Count  Frequency (%)
2      10542  48.8%
1      8355   38.7%
3      2228   10.3%
4      338    1.6%
0      86     0.4%
5      48     0.2%
6      12     0.1%
8      2      < 0.1%
7      2      < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
0      86     0.4%
1      8355   38.7%
2      10542  48.8%
3      2228   10.3%
4      338    1.6%

Maximum 5 values
Value  Count  Frequency (%)
8      2      < 0.1%
7      2      < 0.1%
6      12     0.1%
5      48     0.2%
4      338    1.6%

sqft_living
Real number (ℝ≥0)

Distinct count: 1038
Unique (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2079.8997362698374
Minimum: 290
Maximum: 13540
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 290
5th percentile: 940
Q1: 1427
Median: 1910
Q3: 2550
95th percentile: 3760
Maximum: 13540
Range: 13250
Interquartile range (IQR): 1123

Descriptive statistics

Standard deviation: 918.440897
Coefficient of variation (CV): 0.4415794093
Kurtosis: 5.24309299
Mean: 2079.899736
Median Absolute Deviation (MAD): 540
Skewness: 1.471555427
Sum: 44952873
Variance: 843533.6814

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                Count  Frequency (%)
1300                 138    0.6%
1400                 135    0.6%
1440                 133    0.6%
1800                 129    0.6%
1660                 129    0.6%
1010                 129    0.6%
1820                 128    0.6%
1480                 125    0.6%
1720                 125    0.6%
1540                 124    0.6%
Other values (1028)  20318  94.0%

Minimum 5 values
Value  Count  Frequency (%)
290    1      < 0.1%
370    1      < 0.1%
380    1      < 0.1%
384    1      < 0.1%
390    2      < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
13540  1      < 0.1%
12050  1      < 0.1%
10040  1      < 0.1%
9890   1      < 0.1%
9640   1      < 0.1%

sqft_lot
Real number (ℝ≥0)

Distinct count: 9782
Unique (%): 45.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15106.967565816869
Minimum: 520
Maximum: 1651359
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 520
5th percentile: 1800
Q1: 5040
Median: 7618
Q3: 10688
95th percentile: 43339.2
Maximum: 1651359
Range: 1650839
Interquartile range (IQR): 5648

Descriptive statistics

Standard deviation: 41420.51152
Coefficient of variation (CV): 2.741815082
Kurtosis: 285.0778197
Mean: 15106.96757
Median Absolute Deviation (MAD): 2618
Skewness: 13.06001896
Sum: 326506890
Variance: 1715658774

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                Count  Frequency (%)
5000                 358    1.7%
6000                 290    1.3%
4000                 251    1.2%
7200                 220    1.0%
4800                 120    0.6%
7500                 119    0.6%
4500                 114    0.5%
8400                 111    0.5%
9600                 109    0.5%
3600                 103    0.5%
Other values (9772)  19818  91.7%

Minimum 5 values
Value  Count  Frequency (%)
520    1      < 0.1%
572    1      < 0.1%
600    1      < 0.1%
609    1      < 0.1%
635    1      < 0.1%

Maximum 5 values
Value    Count  Frequency (%)
1651359  1      < 0.1%
1164794  1      < 0.1%
1074218  1      < 0.1%
1024068  1      < 0.1%
982998   1      < 0.1%

floors
Real number (ℝ≥0)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4943089807060566
Minimum: 1.0
Maximum: 3.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1.5
Q3: 2
95th percentile: 2
Maximum: 3.5
Range: 2.5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.5399888951
Coefficient of variation (CV): 0.361363615
Kurtosis: -0.4847229368
Mean: 1.494308981
Median Absolute Deviation (MAD): 0.5
Skewness: 0.6161767212
Sum: 32296.5
Variance: 0.2915880069

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value  Count  Frequency (%)
1      10680  49.4%
2      8241   38.1%
1.5    1910   8.8%
3      613    2.8%
2.5    161    0.7%
3.5    8      < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
1      10680  49.4%
1.5    1910   8.8%
2      8241   38.1%
2.5    161    0.7%
3      613    2.8%

Maximum 5 values
Value  Count  Frequency (%)
3.5    8      < 0.1%
3      613    2.8%
2.5    161    0.7%
2      8241   38.1%
1.5    1910   8.8%

waterfront
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 169.0 KiB

Common values
Value  Count  Frequency (%)
0      21450  99.2%
1      163    0.8%

view
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.23430342849211122
Minimum: 0
Maximum: 4
Zeros: 19489
Zeros (%): 90.2%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7663175693
Coefficient of variation (CV): 3.270620384
Kurtosis: 10.89302168
Mean: 0.2343034285
Median Absolute Deviation (MAD): 0
Skewness: 3.395749593
Sum: 5064
Variance: 0.587242617

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value  Count  Frequency (%)
0      19489  90.2%
2      963    4.5%
3      510    2.4%
1      332    1.5%
4      319    1.5%

Minimum 5 values
Value  Count  Frequency (%)
0      19489  90.2%
1      332    1.5%
2      963    4.5%
3      510    2.4%
4      319    1.5%

Maximum 5 values
Value  Count  Frequency (%)
4      319    1.5%
3      510    2.4%
2      963    4.5%
1      332    1.5%
0      19489  90.2%

condition
Real number (ℝ≥0)

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4094295100171195
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 3
Median: 3
Q3: 4
95th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.6507430464
Coefficient of variation (CV): 0.1908656696
Kurtosis: 0.5257635653
Mean: 3.40942951
Median Absolute Deviation (MAD): 0
Skewness: 1.032804637
Sum: 73688
Variance: 0.4234665124

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value  Count  Frequency (%)
3      14031  64.9%
4      5679   26.3%
5      1701   7.9%
2      172    0.8%
1      30     0.1%

Minimum 5 values
Value  Count  Frequency (%)
1      30     0.1%
2      172    0.8%
3      14031  64.9%
4      5679   26.3%
5      1701   7.9%

Maximum 5 values
Value  Count  Frequency (%)
5      1701   7.9%
4      5679   26.3%
3      14031  64.9%
2      172    0.8%
1      30     0.1%

grade
Real number (ℝ≥0)

Distinct count: 12
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.656873178179799
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 7
Median: 7
Q3: 8
95th percentile: 10
Maximum: 13
Range: 12
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.175458757
Coefficient of variation (CV): 0.1535168116
Kurtosis: 1.190932077
Mean: 7.656873178
Median Absolute Deviation (MAD): 1
Skewness: 0.7711032008
Sum: 165488
Variance: 1.381703289

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value             Count  Frequency (%)
7                 8981   41.6%
8                 6068   28.1%
9                 2615   12.1%
6                 2038   9.4%
10                1134   5.2%
11                399    1.8%
5                 242    1.1%
12                90     0.4%
4                 29     0.1%
13                13     0.1%
Other values (2)  4      < 0.1%

Minimum 5 values
Value  Count  Frequency (%)
1      1      < 0.1%
3      3      < 0.1%
4      29     0.1%
5      242    1.1%
6      2038   9.4%

Maximum 5 values
Value  Count  Frequency (%)
13     13     0.1%
12     90     0.4%
11     399    1.8%
10     1134   5.2%
9      2615   12.1%

sqft_above
Real number (ℝ≥0)

Distinct count: 946
Unique (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1788.3906907879516
Minimum: 290
Maximum: 9410
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 290
5th percentile: 850
Q1: 1190
Median: 1560
Q3: 2210
95th percentile: 3400
Maximum: 9410
Range: 9120
Interquartile range (IQR): 1020

Descriptive statistics

Standard deviation: 828.0909777
Coefficient of variation (CV): 0.4630369538
Kurtosis: 3.402303621
Mean: 1788.390691
Median Absolute Deviation (MAD): 450
Skewness: 1.446664473
Sum: 38652488
Variance: 685734.6673

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
1300                212    1.0%
1010                210    1.0%
1200                206    1.0%
1220                192    0.9%
1140                184    0.9%
1400                180    0.8%
1060                178    0.8%
1180                177    0.8%
1340                176    0.8%
1250                174    0.8%
Other values (936)  19724  91.3%

Minimum 5 values
Value  Count  Frequency (%)
290    1      < 0.1%
370    1      < 0.1%
380    1      < 0.1%
384    1      < 0.1%
390    2      < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
9410   1      < 0.1%
8860   1      < 0.1%
8570   1      < 0.1%
8020   1      < 0.1%
7880   1      < 0.1%

sqft_basement
Real number (ℝ≥0)

ZEROS
Distinct count: 306
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 291.5090454818859
Minimum: 0
Maximum: 4820
Zeros: 13126
Zeros (%): 60.7%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 560
95th percentile: 1190
Maximum: 4820
Range: 4820
Interquartile range (IQR): 560

Descriptive statistics

Standard deviation: 442.5750427
Coefficient of variation (CV): 1.518220616
Kurtosis: 2.715574211
Mean: 291.5090455
Median Absolute Deviation (MAD): 0
Skewness: 1.577965056
Sum: 6300385
Variance: 195872.6684

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
0                   13126  60.7%
600                 221    1.0%
700                 218    1.0%
500                 214    1.0%
800                 206    1.0%
400                 184    0.9%
1000                149    0.7%
900                 144    0.7%
300                 142    0.7%
200                 108    0.5%
Other values (296)  6901   31.9%

Minimum 5 values
Value  Count  Frequency (%)
0      13126  60.7%
10     2      < 0.1%
20     1      < 0.1%
40     4      < 0.1%
50     11     0.1%

Maximum 5 values
Value  Count  Frequency (%)
4820   1      < 0.1%
4130   1      < 0.1%
3500   1      < 0.1%
3480   1      < 0.1%
3260   1      < 0.1%

yr_built
Real number (ℝ≥0)

Distinct count: 116
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1971.0051357978994
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 1900
5th percentile: 1915
Q1: 1951
Median: 1975
Q3: 1997
95th percentile: 2011
Maximum: 2015
Range: 115
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 29.3734108
Coefficient of variation (CV): 0.01490275711
Kurtosis: -0.6574075047
Mean: 1971.005136
Median Absolute Deviation (MAD): 23
Skewness: -0.4698053988
Sum: 42599334
Variance: 862.7972622

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
2014                559    2.6%
2006                454    2.1%
2005                450    2.1%
2004                433    2.0%
2003                422    2.0%
2007                417    1.9%
1977                417    1.9%
1978                387    1.8%
1968                381    1.8%
2008                367    1.7%
Other values (106)  17326  80.2%

Minimum 5 values
Value  Count  Frequency (%)
1900   87     0.4%
1901   29     0.1%
1902   27     0.1%
1903   46     0.2%
1904   45     0.2%

Maximum 5 values
Value  Count  Frequency (%)
2015   38     0.2%
2014   559    2.6%
2013   201    0.9%
2012   170    0.8%
2011   130    0.6%

lat
Real number (ℝ≥0)

Distinct count: 5034
Unique (%): 23.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.56005251931708
Minimum: 47.1559
Maximum: 47.7776
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 47.1559
5th percentile: 47.3103
Q1: 47.471
Median: 47.5718
Q3: 47.678
95th percentile: 47.74964
Maximum: 47.7776
Range: 0.6217
Interquartile range (IQR): 0.207

Descriptive statistics

Standard deviation: 0.1385637102
Coefficient of variation (CV): 0.002913447377
Kurtosis: -0.6763130016
Mean: 47.56005252
Median Absolute Deviation (MAD): 0.1049
Skewness: -0.4852704765
Sum: 1027915.415
Variance: 0.0191999018

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                Count  Frequency (%)
47.6624              17     0.1%
47.5322              17     0.1%
47.6846              17     0.1%
47.5491              17     0.1%
47.6955              16     0.1%
47.6886              16     0.1%
47.6711              16     0.1%
47.5402              15     0.1%
47.6842              15     0.1%
47.6904              15     0.1%
Other values (5024)  21452  99.3%

Minimum 5 values
Value    Count  Frequency (%)
47.1559  1      < 0.1%
47.1593  1      < 0.1%
47.1622  1      < 0.1%
47.1647  1      < 0.1%
47.1764  1      < 0.1%

Maximum 5 values
Value    Count  Frequency (%)
47.7776  3      < 0.1%
47.7775  3      < 0.1%
47.7774  1      < 0.1%
47.7772  3      < 0.1%
47.7771  2      < 0.1%

long
Real number (ℝ)

Distinct count: 752
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -122.21389640494147
Minimum: -122.519
Maximum: -121.315
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: -122.519
5th percentile: -122.387
Q1: -122.328
Median: -122.23
Q3: -122.125
95th percentile: -121.979
Maximum: -121.315
Range: 1.204
Interquartile range (IQR): 0.203

Descriptive statistics

Standard deviation: 0.1408283424
Coefficient of variation (CV): -0.001152310388
Kurtosis: 1.049500887
Mean: -122.2138964
Median Absolute Deviation (MAD): 0.101
Skewness: 0.8850529834
Sum: -2641408.943
Variance: 0.01983262202

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
-122.29             116    0.5%
-122.3              111    0.5%
-122.362            104    0.5%
-122.291            100    0.5%
-122.363            99     0.5%
-122.372            99     0.5%
-122.288            98     0.5%
-122.357            96     0.4%
-122.284            95     0.4%
-122.365            94     0.4%
Other values (742)  20601  95.3%

Minimum 5 values
Value     Count  Frequency (%)
-122.519  1      < 0.1%
-122.515  1      < 0.1%
-122.514  1      < 0.1%
-122.512  1      < 0.1%
-122.511  2      < 0.1%

Maximum 5 values
Value     Count  Frequency (%)
-121.315  2      < 0.1%
-121.316  1      < 0.1%
-121.319  1      < 0.1%
-121.321  1      < 0.1%
-121.325  1      < 0.1%

sqft_living15
Real number (ℝ≥0)

Distinct count: 777
Unique (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1986.552491556008
Minimum: 399
Maximum: 6210
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 399
5th percentile: 1140
Q1: 1490
Median: 1840
Q3: 2360
95th percentile: 3300
Maximum: 6210
Range: 5811
Interquartile range (IQR): 870

Descriptive statistics

Standard deviation: 685.3913043
Coefficient of variation (CV): 0.3450154512
Kurtosis: 1.59709581
Mean: 1986.552492
Median Absolute Deviation (MAD): 410
Skewness: 1.108181276
Sum: 42935359
Variance: 469761.2399

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
1540                197    0.9%
1440                195    0.9%
1560                192    0.9%
1500                181    0.8%
1460                169    0.8%
1580                167    0.8%
1610                166    0.8%
1720                166    0.8%
1800                166    0.8%
1620                165    0.8%
Other values (767)  19849  91.8%

Minimum 5 values
Value  Count  Frequency (%)
399    1      < 0.1%
460    2      < 0.1%
620    2      < 0.1%
670    1      < 0.1%
690    2      < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
6210   1      < 0.1%
6110   1      < 0.1%
5790   6      < 0.1%
5610   1      < 0.1%
5600   1      < 0.1%

sqft_lot15
Real number (ℝ≥0)

Distinct count: 8689
Unique (%): 40.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12768.455651691113
Minimum: 651
Maximum: 871200
Zeros: 0
Zeros (%): 0.0%
Memory size: 169.0 KiB

Quantile statistics

Minimum: 651
5th percentile: 1999.2
Q1: 5100
Median: 7620
Q3: 10083
95th percentile: 37062.8
Maximum: 871200
Range: 870549
Interquartile range (IQR): 4983

Descriptive statistics

Standard deviation: 27304.17963
Coefficient of variation (CV): 2.138408933
Kurtosis: 150.76311
Mean: 12768.45565
Median Absolute Deviation (MAD): 2505
Skewness: 9.506743247
Sum: 275964632
Variance: 745518225.3

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                Count  Frequency (%)
5000                 427    2.0%
4000                 357    1.7%
6000                 289    1.3%
7200                 211    1.0%
4800                 145    0.7%
7500                 142    0.7%
8400                 116    0.5%
3600                 111    0.5%
4500                 111    0.5%
5100                 109    0.5%
Other values (8679)  19595  90.7%

Minimum 5 values
Value  Count  Frequency (%)
651    1      < 0.1%
659    1      < 0.1%
660    1      < 0.1%
748    2      < 0.1%
750    4      < 0.1%

Maximum 5 values
Value   Count  Frequency (%)
871200  1      < 0.1%
858132  1      < 0.1%
560617  1      < 0.1%
438213  1      < 0.1%
434728  1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables; in particular, when Y is an exact, non-horizontal linear function of X, r is +1 or -1 depending only on the sign of the slope, not on the angle the line makes with the x-axis.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
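
That definition translates directly into code. A minimal sketch in plain Python (standard library only, no pandas assumed); `pearson_r` is a hypothetical helper, not part of pandas-profiling, and it rejects constant inputs, for which r is undefined:

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their
    standard deviations. The 1/(n - 1) factors cancel, so sums suffice."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)
```

Because r only depends on deviations from the mean divided by their spread, shifting or positively rescaling either variable (for example, expressing price in thousands of dollars) leaves it unchanged.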

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
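
Following that description, the sketch below ranks each variable and then applies Pearson's formula to the ranks. It assumes the usual convention that tied values share the average of the positions they occupy; all names are hypothetical and only the standard library is used:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

For a relationship such as y = x³ on positive x, ρ is 1 while Pearson's r stays below 1, which is the nonlinear monotonic case the paragraph describes.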

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.
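
A direct transcription of that pair-counting definition, sketched in plain Python. This is the basic τ-a form, in which pairs tied in X or Y count as neither concordant nor discordant; `kendall_tau_a` is a hypothetical name:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same ordering in x and y gives a positive product.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs
```

Checking every pair takes time proportional to n², about 234 million pairs for the 21613 rows here, so this is only illustrative; O(n log n) methods such as Knight's algorithm exist. pandas' `DataFrame.corr(method="kendall")` delegates to SciPy's `scipy.stats.kendalltau`, which defaults to the tie-adjusted τ-b variant, so results differ from this τ-a sketch when ties are present.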

Phik (φk)

Phik (φk) is a recent, practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependencies, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available for it.

Missing values

Sample

First rows

   price      bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  grade  sqft_above  sqft_basement  yr_built  lat      long      sqft_living15  sqft_lot15
0  221900.0   3         1          1180         5650      1.0     0           0     3          7      1180        0              1955      47.5112  -122.257  1340           5650
1  538000.0   3         2          2570         7242      2.0     0           0     3          7      2170        400            1951      47.7210  -122.319  1690           7639
2  180000.0   2         1          770          10000     1.0     0           0     3          6      770         0              1933      47.7379  -122.233  2720           8062
3  604000.0   4         3          1960         5000      1.0     0           0     5          7      1050        910            1965      47.5208  -122.393  1360           5000
4  510000.0   3         2          1680         8080      1.0     0           0     3          8      1680        0              1987      47.6168  -122.045  1800           7503
5  1225000.0  4         4          5420         101930    1.0     0           0     3          11     3890        1530           2001      47.6561  -122.005  4760           101930
6  257500.0   3         2          1715         6819      2.0     0           0     3          7      1715        0              1995      47.3097  -122.327  2238           6819
7  291850.0   3         1          1060         9711      1.0     0           0     3          7      1060        0              1963      47.4095  -122.315  1650           9711
8  229500.0   3         1          1780         7470      1.0     0           0     3          7      1050        730            1960      47.5123  -122.337  1780           8113
9  323000.0   3         2          1890         6560      2.0     0           0     3          7      1890        0              2003      47.3684  -122.031  2390           7570
9323000.032189065602.0003718900200347.3684-122.03123907570

Last rows

       price      bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  grade  sqft_above  sqft_basement  yr_built  lat      long      sqft_living15  sqft_lot15
21603  507250.0   3         2          2270         5536      2.0     0           0     3          8      2270        0              2003      47.5389  -121.881  2270           5731
21604  429000.0   3         2          1490         1126      3.0     0           0     3          8      1490        0              2014      47.5699  -122.288  1400           1230
21605  610685.0   4         2          2520         6023      2.0     0           0     3          9      2520        0              2014      47.5137  -122.167  2520           6023
21606  1007500.0  4         3          3510         7200      2.0     0           0     3          9      2600        910            2009      47.5537  -122.398  2050           6200
21607  475000.0   3         2          1310         1294      2.0     0           0     3          8      1180        130            2008      47.5773  -122.409  1330           1265
21608  360000.0   3         2          1530         1131      3.0     0           0     3          8      1530        0              2009      47.6993  -122.346  1530           1509
21609  400000.0   4         2          2310         5813      2.0     0           0     3          8      2310        0              2014      47.5107  -122.362  1830           7200
21610  402101.0   2         0          1020         1350      2.0     0           0     3          7      1020        0              2009      47.5944  -122.299  1020           2007
21611  400000.0   3         2          1600         2388      2.0     0           0     3          8      1600        0              2004      47.5345  -122.069  1410           1287
21612  325000.0   2         0          1020         1076      2.0     0           0     3          7      1020        0              2008      47.5941  -122.299  1020           1357

Duplicate rows

Most frequent

   price      bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  grade  sqft_above  sqft_basement  yr_built  lat      long      sqft_living15  sqft_lot15  count
0  259950.0   2         2          1070         649       2.0     0           0     3          9      720         350            2008      47.5213  -122.357  1070           928         2
1  529500.0   3         2          1410         905       3.0     0           0     3          9      1410        0              2014      47.5818  -122.402  1510           1352        2
2  550000.0   4         1          2410         8447      2.0     0           3     4          8      2060        350            1936      47.6499  -122.088  2520           14789       2
3  555000.0   3         2          1940         3211      2.0     0           0     3          8      1940        0              2009      47.5644  -122.093  1880           3078        2
4  585000.0   3         2          2290         5089      2.0     0           0     3          9      2290        0              2001      47.5443  -122.172  2290           7984        2
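
The "Most frequent" table groups identical rows and counts them. A minimal standard-library sketch that yields both the overview's duplicate-row count and this table, assuming rows are tuples of column values and that duplicates are counted as pandas' `DataFrame.duplicated()` does with its default `keep="first"` (every occurrence after the first); the report does not state its exact rule, and `duplicate_summary` is a hypothetical name:

```python
from collections import Counter


def duplicate_summary(rows, top=10):
    """Count repeated rows and list the most frequent duplicated ones.

    rows: list of tuples, one tuple of column values per observation.
    Returns (n_duplicate_rows, [(row, count), ...]) sorted by count.
    """
    counts = Counter(rows)
    # Every occurrence after the first counts as a duplicate row.
    n_duplicates = sum(c - 1 for c in counts.values())
    # Only rows that occur more than once belong in the table.
    repeated = [(row, c) for row, c in counts.items() if c > 1]
    repeated.sort(key=lambda item: item[1], reverse=True)
    return n_duplicates, repeated[:top]
```

In this report each listed row occurs exactly twice, so counting the repeats after the first occurrence gives the 5 duplicate rows shown in the overview.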